361 - RECITATION 1

REVIEW

Creating vector and Matrix in R

Vector is a basic data structure in R. It contains element of the same type. The data types can be logical, integer, double, character, complex or raw.

  • Vectors are generally created using the c() function.
  • A vector’s type can be checked with the typeof() function.
  • Another important property of a vector is its length. This is the number of elements in the vector and can be checked with the function length().

1. Create the vectors below.

a= (1,2,3,. . . ,28,29,30)

a<- c(1:30)
a
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30
a<- 1:30
a
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30
a <- seq(1, 30, by=1)
a
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30

b = (1,2,3,. . . ,10,1,2,3,. . . ,10,1,2,3,. . . ,10)

b<-rep(c(1:10), 3)
b
##  [1]  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5
## [26]  6  7  8  9 10

c= (1,1,1, 2,2,2, 3,3,3, 4,4,4,…, 10,10,10)

c<-rep(c(1:10), each=3)
c
##  [1]  1  1  1  2  2  2  3  3  3  4  4  4  5  5  5  6  6  6  7  7  7  8  8  8  9
## [26]  9  9 10 10 10

d = (20,18,16,. . . ,6,4,2)

d<- seq(from = 20,to = 2, by = -2)
d
##  [1] 20 18 16 14 12 10  8  6  4  2

Matrix is a two dimensional data structure in R programming. Matrix is similar to vector but additionally contains the dimension attribute.

  • All attributes of an object can be checked with the attributes() function (dimension can be checked directly with the dim() function).

  • We can check if a variable is a matrix or not with the class() function.

  • Diag() shows the diagonal elements of a matrix.

  • det() gives us determinant of a matrix.

  • t() gives the transpose of a matrix.

  • solve() takes the inverse of a matrix.

  • Matrix multiplication is %*%

2. Create the matrix below.

  • 3 x 3 matrix with every entry equals to 2.
matrix1 <- matrix(2, ncol = 3, nrow = 3)
matrix1
##      [,1] [,2] [,3]
## [1,]    2    2    2
## [2,]    2    2    2
## [3,]    2    2    2
  • 6 x 6 identity matrix
matrix2 <- diag(6)
matrix2
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    0    0    0    0    0
## [2,]    0    1    0    0    0    0
## [3,]    0    0    1    0    0    0
## [4,]    0    0    0    1    0    0
## [5,]    0    0    0    0    1    0
## [6,]    0    0    0    0    0    1
  • 4 x 4 matrix X:

\[ X = \begin{bmatrix}5 & 1 & -3 & 7 \\ 3 & -2 & 0 & 1\\ 2 & 0 & 8 & 6\\ 7 & -1 & 0 & 1 \end{bmatrix} \]

X <- matrix(c(5,3,2,7,1,-2,0,-1,-3,0,8,0,7,1,6,1), ncol = 4, nrow=4)
X
##      [,1] [,2] [,3] [,4]
## [1,]    5    1   -3    7
## [2,]    3   -2    0    1
## [3,]    2    0    8    6
## [4,]    7   -1    0    1

Matrix Operations

A) Use the previously created matrix X.

  • Check dimension and class of X
class(X)
## [1] "matrix" "array"
dim(X)
## [1] 4 4
attributes(X) 
## $dim
## [1] 4 4
  • Compute determinant of matrix X.
det(X)
## [1] 800
  • Take transpose of X (Y = \(X^T\)).
Y <- t(X)
Y
##      [,1] [,2] [,3] [,4]
## [1,]    5    3    2    7
## [2,]    1   -2    0   -1
## [3,]   -3    0    8    0
## [4,]    7    1    6    1
  • Take inverse of X (Z = \(X^{-1}\)).
Z <- solve(X)
Z
##       [,1]    [,2]     [,3]   [,4]
## [1,] -0.01 -0.1025 -0.00375  0.195
## [2,]  0.04 -0.5900  0.01500  0.220
## [3,] -0.08 -0.0700  0.09500  0.060
## [4,]  0.11  0.1275  0.04125 -0.145
  • Multiply X and Z.
X %*% Z
##              [,1]          [,2]          [,3]          [,4]
## [1,] 1.000000e+00  0.000000e+00  0.000000e+00  2.220446e-16
## [2,] 5.551115e-17  1.000000e+00 -2.775558e-17 -8.326673e-17
## [3,] 1.110223e-16 -1.110223e-16  1.000000e+00  1.110223e-16
## [4,] 0.000000e+00 -5.551115e-17  0.000000e+00  1.000000e+00

B)Now suppose;

\[ A = \begin{bmatrix}1 & 2 & 5 \\ 2 & 3 & 6\\ -2 & 0 & 1\\ \end{bmatrix} \]

  • Create matrix A.
A <- matrix(c(1,2,5,2,3,6,-2,0,1), nrow = 3, byrow = TRUE)
A
##      [,1] [,2] [,3]
## [1,]    1    2    5
## [2,]    2    3    6
## [3,]   -2    0    1
  • Write the diagonal elements of matrix A in a vector B.
B <- diag(A)
B
## [1] 1 3 1
  • Create a new matrix C with \(1^{st}\) and \(3^{rd}\) rows and \(2^{nd}\) and \(3^{rd}\) columns of A.
C <- A[-2,2:3]
C
##      [,1] [,2]
## [1,]    2    5
## [2,]    0    1
  • Create a new matrix D by deleting 3rd row of A.
D <- A[-3,]
D
##      [,1] [,2] [,3]
## [1,]    1    2    5
## [2,]    2    3    6
  • Replace the third column of A by the sum of the second and third columns.
A[,3] <- A[,2] + A[,3]
A
##      [,1] [,2] [,3]
## [1,]    1    2    7
## [2,]    2    3    9
## [3,]   -2    0    1
  • Compute the mean of each column of A. (Without using loop)
colMeans(A)
## [1] 0.3333333 1.6666667 5.6666667
#column means
apply(A,2,mean)
## [1] 0.3333333 1.6666667 5.6666667
#row means
apply(A,1,mean)
## [1]  3.3333333  4.6666667 -0.3333333

Note That;

apply(X, MARGIN, FUN, …)

where:

  • X is an array or a matrix if the dimension of the array is 2;

  • MARGIN is a variable defining how the function is applied: when MARGIN=1, it applies over rows, whereas with MARGIN=2, it works over columns. Note that when you use the construct MARGIN=c(1,2), it applies to both rows and columns;

  • FUN, which is the function that you want to apply to the data. It can be any R function, including a User Defined Function (UDF).

Distribution functions

Every distribution has four associated functions whose prefix indicates the type of function and the suffix indicates the distribution. To exemplify the use of these functions, let use the normal (Gaussian) distribution. The four normal distribution functions are:

  • dnorm: density function of the normal distribution

The probability density function (PDF, in short: density) indicates the probability of observing a measurement with a specific value and thus the integral over the density is always 1.

  • pnorm: cumulative density function of the normal distribution

The cumulative density (CDF) function is a monotonically increasing function as it integrates over densities

  • qnorm: quantile function of the normal distribution

The quantile function is simply the inverse of the cumulative density function (iCDF). Thus, the quantile function maps from probabilities to values. Let’s take a look at the quantile function for 𝑃[𝑋<=𝑥]

  • rnorm: random sampling from the normal distribution

The random sampling function: rnorm

When we want to draw random samples from the normal distribution, you can use rnorm.

The full list of standard distributions available can be seen using “?distribution.”

?distribution

Random Number Generation

R has functions to generate a random number from many standard distribution like uniform distribution, binomial distribution, normal distribution etc.

  • Functions that generate random deviates start with the letter r. For example, runif() generates random numbers from a uniform distribution and rnorm() generates from a normal distribution.
runif(1)    # generates 1 random number from standard uniform(0,1)
## [1] 0.9532313
runif(3)    # generates 3 random number
## [1] 0.8682768 0.1794216 0.8603041
runif(3, min=5, max=10)    # define the range between 5 and 10
## [1] 6.938969 5.528239 5.411266

Assume X ~ Normal(5, 9); generate a random sample of size n = 1000. (While generating use set.seed(361) command.)

set.seed(361)
x <- rnorm(n = 1000, mean = 5, sd = 3)
head(x, 20)
##  [1]  6.0846793  7.1054138  3.4195848  5.2180070  5.1573151  1.5472171
##  [7]  8.2884016  4.8132810  6.0027304  8.4461702  3.2543662  2.7511491
## [13]  7.7312325  8.3522899  3.7435374  2.2771184 10.9100903 -0.7250312
## [19]  6.6288625  3.9624669

Compute following:

  • Minimum & Maximum value and range
min(x)
## [1] -4.204518
max(x)
## [1] 15.22712
range(x)
## [1] -4.204518 15.227121
  • Mean, Median & Variance
mean(x)
## [1] 5.054446
median(x)
## [1] 5.07124
var(x)
## [1] 9.990544
  • Order generated numbers in ascending order and print first and last 10.
sorted_x <- sort(x)
head(sorted_x,10)
##  [1] -4.204518 -3.981195 -3.927701 -3.636500 -3.482656 -3.229263 -3.183069
##  [8] -2.966523 -2.754209 -2.672590
tail(sorted_x,10)
##  [1] 12.16955 12.39004 13.09075 13.10228 13.24375 13.49768 13.63390 13.66403
##  [9] 14.74095 15.22712

Visualization

Plot the histogram and box plot of a vector that you generated in a single figure usning ggplot.

data_x<- as.data.frame(x) #to use ggplot, it is needed a dataframe
library(ggplot2)
h<-ggplot(data_x, aes(x=x)) + 
  geom_histogram(color="violet", fill="pink", bins = 30) +labs(title = "Histogram of normal distribution(n=1000)")


b<-ggplot(data_x, aes(x=x)) + 
  geom_boxplot(color="blue", fill="lightblue") +labs(title = "Boxplot of normal distribution(n=1000)")

library(ggpubr)
ggarrange(h,b)

set.seed(361)
y <- rnorm(n = 1000000, mean = 5, sd = 3)
head(y, 20)
##  [1]  6.0846793  7.1054138  3.4195848  5.2180070  5.1573151  1.5472171
##  [7]  8.2884016  4.8132810  6.0027304  8.4461702  3.2543662  2.7511491
## [13]  7.7312325  8.3522899  3.7435374  2.2771184 10.9100903 -0.7250312
## [19]  6.6288625  3.9624669
data_y<- as.data.frame(y) #to use ggplot, it is needed a dataframe
library(ggplot2)
h<-ggplot(data_y, aes(x=y)) + 
  geom_histogram(color="violet", fill="pink", bins = 30) +labs(title = "Histogram of normal distribution(n=1000000)")


b<-ggplot(data_y, aes(x=y)) + 
  geom_boxplot(color="blue", fill="lightblue") +labs(title = "Boxplot of normal distribution(n=1000000)")

library(ggpubr)
ggarrange(h,b)

As seen, as sample size increases, distribution of random variable looks more normal (more symmetrical)

TASK!

Generate a random sample of size 5000 from gamma distribution with shape parameter 2 and scale parameter 3, then plot the density of sample using ggplot.

z<- rgamma(n=5000, shape = 2, scale = 3)
head(z)
## [1]  4.551160  8.219747  2.544232 17.281052  7.778567  2.112637
data_z<- as.data.frame(z) #to use ggplot, it is needed a dataframe
library(ggplot2)
h<-ggplot(data_z, aes(x=z)) + 
  geom_histogram(color="violet", fill="pink", bins = 30) +labs(title = "Histogram of gamma distribution")


b<-ggplot(data_z, aes(x=z)) + 
  geom_boxplot(color="blue", fill="lightblue") +labs(title = "Boxplot of gamma distribution")

library(ggpubr)
ggarrange(h,b)

Conditionals

The syntax of if statement is:

If the test_expression is TRUE, the statement gets executed. But if it’s FALSE, nothing happens. test_expression can be a logical or numeric vector, but only the first element is taken into consideration. In the case of numeric vector, zero is taken as FALSE, rest as TRUE.

The else part is optional and is only evaluated if test_expression is FALSE. It is important to note that else must be in the same line as the closing braces of the if statement.

ifelse() function

ifelse(test_expression, x, y)

This returned vector has element from x if the corresponding value of test_expression is TRUE or from y if the corresponding value of test_expression is FALSE.

A)Generate a random sample of size 20 between 1 to 100. (Again, while generating use set.seed(361)) Then, count the number of even numbers in vector y.

y <- sample(1:100, size = 20, replace = FALSE)
y
##  [1] 50 19 74  7 91 25 73 71 59 51 13  2 53 17 78 48 40 64 46  3
count <- 0
for(i in 1:length(y)){
  if(y[i] %% 2 == 0)
    count = count + 1}

print(count)
## [1] 8
sum(ifelse(y %% 2 == 0, 1 ,0))
## [1] 8

Loops and functions

for loop

It is a type of control statement that enables one to easily construct a loop that has to run statements or a set of statements multiple times. For loop is commonly used to iterate over items of a sequence. It is an entry controlled loop, in this loop the test condition is tested first, then the body of the loop is executed, the loop body would not be executed if the test condition is false.

while loop

It is a type of control statement which will run a statement or a set of statements repeatedly unless the given condition becomes false. It is also an entry controlled loop, in this loop the test condition is tested first, then the body of the loop is executed, the loop body would not be executed if the test condition is false.

functions

A function is a set of statements organized together to perform a specific task. R has a large number of in-built functions and the user can create their own functions.

In R, a function is an object so the R interpreter is able to pass control to the function, along with arguments that may be necessary for the function to accomplish the actions.

The function in turn performs its task and returns control to the interpreter as well as any result which may be stored in other objects.

  • Function Name − This is the actual name of the function. It is stored in R environment as an object with this name.

  • Arguments − An argument is a placeholder. When a function is invoked, you pass a value to the argument. Arguments are optional; that is, a function may contain no arguments. Also arguments can have default values.

  • Function Body − The function body contains a collection of statements that defines what the function does.

  • Return Value − The return value of a function is the last expression in the function body to be evaluated.

A) Write a for loop to calculate the sum of the positive integers up to a given integer.(Let’s say it is 20)

total <- 0
for(i in 1:20){
total <- total + i
cat(paste("Round: ", i, "\t", "Total: ", total, "\n"))
}
## Round:  1     Total:  1 
## Round:  2     Total:  3 
## Round:  3     Total:  6 
## Round:  4     Total:  10 
## Round:  5     Total:  15 
## Round:  6     Total:  21 
## Round:  7     Total:  28 
## Round:  8     Total:  36 
## Round:  9     Total:  45 
## Round:  10    Total:  55 
## Round:  11    Total:  66 
## Round:  12    Total:  78 
## Round:  13    Total:  91 
## Round:  14    Total:  105 
## Round:  15    Total:  120 
## Round:  16    Total:  136 
## Round:  17    Total:  153 
## Round:  18    Total:  171 
## Round:  19    Total:  190 
## Round:  20    Total:  210

B) Create an 6x6 upper triangular matrix of 1s. (Hint: Firstly, create a zero matrix then fill 1’s with loops.)

\[ A = \begin{bmatrix}1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 & 1 & 1\\ 0 & 0 & 1 & 1 & 1 & 1\\ 0 & 0 & 0 & 1 & 1 & 1\\ 0 & 0 & 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix} \]

#Create A.
A <- matrix(0,ncol = 6, nrow = 6)
A
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    0    0    0    0    0    0
## [2,]    0    0    0    0    0    0
## [3,]    0    0    0    0    0    0
## [4,]    0    0    0    0    0    0
## [5,]    0    0    0    0    0    0
## [6,]    0    0    0    0    0    0
for(i in 1:6){
  for(j in i:6){
    A[i,j] = 1}}
A
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    1    1    1    1    1
## [2,]    0    1    1    1    1    1
## [3,]    0    0    1    1    1    1
## [4,]    0    0    0    1    1    1
## [5,]    0    0    0    0    1    1
## [6,]    0    0    0    0    0    1

C) Print integers from 1 to 8 and calculate 8! by using “while” loop.

#integers from 1 to 8
a <- 1
b <- 8
while(a <= b){ 
  print(a)
  a <- a + 1 #increase a by 1
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
#8!
a <- 1
b <- 8
res <- 1 # 0! = 1
while(a <= b){
  res <- res * a
  cat("step",a,":","\t","result =", res, "\n")
  a = a + 1
}
## step 1 :      result = 1 
## step 2 :      result = 2 
## step 3 :      result = 6 
## step 4 :      result = 24 
## step 5 :      result = 120 
## step 6 :      result = 720 
## step 7 :      result = 5040 
## step 8 :      result = 40320

D) Calculate the sum of \(1^p + 2^p + ... + 10^p\) for p = 1, 2, 3, 4, 5, 6 using while loop and print all results.

p <- 1 
while(p <= 6){
  a <- 1
  b <- 10
  total <- 0
  while(a <= b){
    total <- total + a^p
    a <- a + 1
    }
  cat("P:", p, "\t", "Total is", total, "\n")
  p <- p + 1
  }
## P: 1      Total is 55 
## P: 2      Total is 385 
## P: 3      Total is 3025 
## P: 4      Total is 25333 
## P: 5      Total is 220825 
## P: 6      Total is 1978405

E) Write a function to print the first 𝑛 elements of Fibonacci sequence (0,1,1,2,3,5,… etc, each number is the sum of the two numbers before it). Run this function for 𝑛=25.

n=25
fibonacci <- function(n) {
  x <- NULL  #an empty vector
  x[1] <- 0  #write 0 as the first element of x 
  x[2] <- 1  #write 1 as the 2nd element of x
  for (i in 3:n) {
    x[i] <- x[i-1] + x[i-2]
  }
  return(x)
}

F <- fibonacci(n)
F
##  [1]     0     1     1     2     3     5     8    13    21    34    55    89
## [13]   144   233   377   610   987  1597  2584  4181  6765 10946 17711 28657
## [25] 46368

SUMMARY

Statistical Probability Functions

The following table describes functions related to probability distributions. For random number generators below, you can use set.seed(1234) or some other integer to create reproducible pseudo-random numbers.